Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New MVA-based Tau-Ids #108

Closed
wants to merge 5 commits into from

Conversation

mbluj
Copy link

@mbluj mbluj commented Feb 8, 2018

The MVA-based Tau-IDs are going to be retrained with 94X Fall17 samples and then should be evaluated using MiniAOD Taus (skimmedTaus) and added to NanoAOD.

This is an initial implementation, based on MVA training with 92X Summer17 samples, mentioned in the issue #107. The idea is that you start revision from this initial implementation which will be updated when new MVA trainings are provided.

Note: This PR bases on CMSSW_9_4_4 which looks be newer than the master branch of NanoAOD repository. Therefore, the PR consists of a set of unrelated commits - only ones dated for 8/2/2018 cover work done here. The commits change two files:

  • RecoTauTag/Configuration/python/loadRecoTauTagMVAsFromPrepDB_cfi.py with definition of MVA payloads (cherry-picked from a PR to official CMSSW), and
  • PhysicsTools/NanoAOD/python/taus_cff.py where new Tau-Ids are evaluated and added to the Tau table

FYI, @veelken @steggema @ohlushch

@arizzi
Copy link

arizzi commented Feb 8, 2018

which branch/release you started from?
it looks there is a lot of unrelated stuff in this PR

@gpetruc should we merge 944 into master?

@mbluj
Copy link
Author

mbluj commented Feb 8, 2018

I have started with 944 then merged nanoAOD master like this
git cms-merge-topic cms-nanoAOD:master
I have checked that my changes are only ones when compared to official-cmssw CMSSW_9_4_X branch, so I think that updating master of nanoAOD to 944 (or 94X) should do the trick.

@gpetruc
Copy link

gpetruc commented Feb 8, 2018

@arizzi : yes, every time we update the twiki recipe to a new release we must first merge that release into nanoAOD:master so that the PRs don't contain other stuff making the rebases to master-cmsswmaster more tricky

@arizzi arizzi changed the base branch from master to fixLHE_94X February 8, 2018 16:06
@arizzi arizzi changed the base branch from fixLHE_94X to master February 8, 2018 16:06
@arizzi
Copy link

arizzi commented Feb 8, 2018

I changed the base back and forth so it recomputed the list of commits

@@ -63,6 +147,7 @@ def _tauId6WPMask(pattern,doc):
idAntiEle = _tauId5WPMask("againstElectron%sMVA6", doc= "Anti-electron MVA discriminator V6"),
idMVAnewDM = _tauId6WPMask( "by%sIsolationMVArun2v1DBnewDMwLT", doc="IsolationMVArun2v1DBnewDMwLT ID working point"),
idMVAoldDM = _tauId6WPMask( "by%sIsolationMVArun2v1DBoldDMwLT", doc="IsolationMVArun2v1DBoldDMwLT ID working point"),
idMVAoldDMnew = _tauId7WPMask( "by%sIsolationMVArun2v1DBoldDMwLTNew", doc="IsolationMVArun2v1DBoldDMwLT ID working point (New)"),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could 'new' become '2017' or '94X' or anything more specific? ("new" is a relative concept)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree and will correct "new"-> "2017v1". Of course at some point it will be updated to something different (2017 will become default without any suffix and other "new" will appear), but hopefully users will deal with it.

@gpetruc-bot
Copy link

@gpetruc
Copy link

gpetruc commented Feb 8, 2018

Hi @mbluj , when you say some commits are cherry-picked from the CMSSW PR, is it from cms-sw#21977 ?
Then we will get duplicated commits once that one is merged in CMSSW.
If it's just the first two commits of the PR, then maybe it would be better to make the nanoAOD PR starting from those 2 commits (not cherry-picked) and just add 48d3c4b on top of them.

@arizzi
Copy link

arizzi commented Feb 8, 2018

another possibility is that the additional definition from RecoTauTag/Configuration/python/loadRecoTauTagMVAsFromPrepDB_cfi.py
are (temporarly) copied to another file in nanoaod realm

@gpetruc-bot
Copy link

Please update PhysicsTools/NanoAOD/python/nanoDQM_cfi.py: take this patch or run prepareDQM.py -d -u nano_file_mc.root, and then if needed adjust the plot range using some human common sense.

@mbluj
Copy link
Author

mbluj commented Feb 8, 2018

In fact the commits are cherry-picked from other earlier PR already merged to CMSSW master (part of 100X) while the ones in cms-sw#21977 (for 94X) are also cherry-picked from the original PR.
I can do what you propose, but I need your guidance as it is beyond my git experience :(

Concerning DQM, yes sure,
but all this needs wait until tomorrow.

Copy link

@gpetruc-bot gpetruc-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automatic test report for 299840

Code integration

Code checks not run for this PR (no source files modified)

Please update PhysicsTools/NanoAOD/python/nanoDQM_cfi.py: take this patch or run prepareDQM.py -d -u nano_file_mc.root, and then if needed adjust the plot range using some human common sense.

Tests

  • Long test data80X (10000 events): passed, no significant changes; dqm plots: all, diff
  • Long test data80Xhip (3000 events): passed, no significant changes; dqm plots: all, diff
  • Long test data94X (10000 events): passed, no significant changes; dqm plots: all, diff
  • Long test mc80X (10000 events): passed, no significant changes; dqm plots: all, diff
  • Long test mc94X (10000 events): passed, no significant changes; dqm plots: all, diff
  • Test mc_94X: passed
  • Test mc_80X: passed
  • Test mc_92X: passed
  • Test data_92X: passed
  • Test data_80X: passed

Disk size report

Sample kb/event ref kb/event diff
TTbar MC 94X 1.514 1.509 0.004 ( +0.3% )
TTbar MC 80X 1.561 1.557 0.003 ( +0.2% )
Data 80X 0.579 0.577 0.002 ( +0.3% )
Data 80X, Mu Run2016E 0.510 0.507 0.003 ( +0.7% )

@mbluj
Copy link
Author

mbluj commented Feb 9, 2018

Hello,
if you still want original commits with the updated payloads as in cms-sw#21977 I think you can switch this PR to [mbluj:]CMSSW_9_4_X_MVA2017v1ForNano_2, but please check if it is what you asked for.

P.S. I can send other PR and close this one if it is not possible to change the head branch of a PR.

@veelken
Copy link

veelken commented Feb 12, 2018

Dear Andrea and Giovanni,

it would be very helpful if this PR can be merged soon.
Would be great if you could briefly summarize what still needs to be done before the PR can be merged.

Thank you,

Christian

@mbluj
Copy link
Author

mbluj commented Feb 12, 2018

Sorry, Christian, but I think that this PR waits until final MVA training are there. It is as I think that there is not any official production of NanoAOD expected before MiniAODv2 (with non-existing CMSW_945 or newer) which in principle should consists of MVAIso 2017v1 by default, so not a real use case. But please correct me if I am wrong.

@arizzi arizzi added the HOLD label Mar 14, 2018
@arizzi
Copy link

arizzi commented Mar 20, 2018

@mbluj we need to converge this week on a version of NANOAOD to produce in chain with MiniAODV2, starting from MiniAODV2 inputs. Can you clairfy how you plan to include the latest tauID?

@gpetruc
Copy link

gpetruc commented Mar 21, 2018

It's not clear to me how we want to proceed now given that for 94X miniAOD v2 we will have only the 2017v1 ID while for 94X miniAOD v1, in principle we can have both 2017 and 2016.
I see a few options:

  1. in both miniAOD v1 and v2 we always only provide the 2017v1 id, with name idMVAoldDM
  2. in both miniAOD v1 and v2 we always only provide the 2017v1 id, with name idMVAoldDM2017v1
  3. in miniAOD v2 we provide the 2017v1 id with name idMVAoldDM2017v1, in miniAOD v1 we provide both the 2017v1 ID with name idMVAoldDM2017v1 and the older ID with name idMVAoldDM
  4. in miniAOD v2 we provide the 2017v1 id with name idMVAoldDM; in miniAOD v1 we provide both the 2017v1 ID with name idMVAoldDM and the older ID with some other name (e.g. idMVAoldDM2016)

Between 1 and 2, it's a matter of whether you want to force people to be aware of the new ID or not. Given that the bitmasks also change because of the new VVL WP, I would prefer option 2.

Between the options 1-2 vs 3-4 boils down to whether there are ongoing analysis efforts using the old ID that should not be disrupted (I hope the answer is no)

In any case, this PR should be re-opened based on [mbluj:]CMSSW_9_4_X_MVA2017v1ForNano_2 so that the commits in common with 9_4_5_cand1 will not be duplicated. Also, it should be modified to use the new run2_nanoAOD_94XMiniAODv1 era to load the 2017v1 ID on old miniAODs.

@mbluj
Copy link
Author

mbluj commented Mar 21, 2018

Hello,
I am away this week with rather loose internet connection, so I cannot do much.
When I created this PR I hoped that MVAIso2017v1 would be replaced in it by the final MVAIso2017v2, but apparently it is not possible due to some issues with a new training. I am adding @steggema who can comment.
Concerning options listed by Giovanni: I also vote for option 2, but I would like that decision is taken by POG conveners, so again @steggema should comment.
BTW, In principle there is also 5th option that for both v1 and v2 two MVAIso's are provided. However it will require some additional coding and playing with eras (one can use existing ones) as for v1 MVAIso2017v1 need to be evaluated while for v2 MVAIsoOld need to be evaluated...

@arizzi
Copy link

arizzi commented Mar 22, 2018

@steggema we need to make an AN release this week in order to process the MiniAOD v2 immediately and create nanoaod out of those.
Is there someone from the TAU POG available to sort out this in the next ~24h ?

@isobelojalvo
Copy link

@arizzi, @steggema is out this week. We have MVAIso2017v2 produced but not yet uploaded to the DB. Unfortunately it would take a few days to sort out and integrate. If it is possible to wait so it could be integrated then the Tau POG would have a preference to do that. However, if not, it is not so bad to re-run at miniAOD. I suppose it would require then a new version of nanoAOD. Apologies for not watching this thread more closely.

@mbluj
Copy link
Author

mbluj commented Mar 26, 2018

@arizzi @gpetruc, where we stand with this PR? Do we have still time to try upgrade it to MVA2017v2? If yes, we will try to provide it by tomorrow early afternoon (say by about lunch time). If not we will try to finish with option 2 with a setup giving same output NanoAOD with both MiniAODv1 and v2. Implementation of the latter should be quicker than the former, hopefully by this afternoon (however not fully granted).

@mbluj
Copy link
Author

mbluj commented Mar 27, 2018

Hello,
it is just to let you know that I'm working in update of this PR with new MVA tau-id, I hope be done today by 4pm.
The starting point for me is CMSSW_9_4_5_cand1 and master of NanoAOD.

@gpetruc gpetruc closed this Mar 27, 2018
@mbluj mbluj deleted the CMSSW_9_4_X_MVA2017v1ForNano branch September 7, 2018 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

6 participants